Analyzing weak lensing of the cosmic microwave background using the hkehhood 

function 

Christopher M. Hirata^ and Uros Seljal|] 

Physics Department, Princeton University, Princeton, NJ 08544 
(Dated: September 24, 2002) 

Future experiments will produce high-resolution temperature maps of the cosmic microwave back- 
ground (CMB) and are expected to reveal the signature of gravitational lensing by intervening 
large-scale structures. We construct all-sky maximum-likelihood estimators that use the lensing 
effect to estimate the projected density (convergence) of these structures, its power spectrum, and 
cross-correlation with other observables. This contrasts with earlier quadratic-estimator approaches 
that Taylor-expanded the observed CMB temperature to linear order in the lensing deflection angle; 
these approaches gave estimators for the temperature-convergence correlation in terms of the CMB 
three-point correlation function and for the convergence power spectrum in terms of the CMB four- 
point correlation function, which can be biased and non-optimal due to terms beyond the linear 
order. We show that for sufficiently weak lensing, the maximum-likelihood estimator reduces to 
the computationally less demanding quadratic estimator. The maximum likelihood and quadratic 
approaches are compared by evaluating the root-mean-square (RMS) error and bias in the recon- 
structed convergence map in a numerical simulation; it is found that both the RMS errors and bias 
are of order 1 percent for the case of Planck and of order 10-20 percent for a 1 arcminute beam 
experiment. We conclude that for recovering lensing information from temperature data acquired 
by these experiments, the quadratic estimator is close to optimal, but further work will be required 
to determine whether this is also the case for lensing of the CMB polarization field. 

PACS numbers: 95.75.Pq,98.62.Sb,98.80.Es 



I. INTRODUCTION 



Gravitational weak lensing of the cosmic microwave background (CMB) has been recognized as a potential indicator 
of large-scale structure in the universe. Compared to galaxy surveys, weak lensing has the advantage of directly 
tracing the matter density, thus avoiding the uncertainties associated with the relationship between the distributions 
of galaxies and of mass Because the CMB is the most distant background object that can be used for weak 
lensing studies, it probes the matter distribution at higher redshifts than can be reached by galaxy weak lensing and 
is sensitive to the largest observable scales in the universe [|l|, ||, ^, ^, ||] . 

In addition to providing data on the power spectrum of density fluctuations on these large scales, CMB weak lensing 
may yield constraints on the expansion history of the universe by making possible a measurement of the integrated 
Sachs- Wolfe (ISW) effect. The ISW effect (the change in temperature of the CMB radiation as it passed through 
a changing gravitational potential) is smaller than the primary CMB fluctuations produced in the early universe 
and consequently can be detected only through the cross-correlation of CMB observations with some tracer of the 
gravitational potential. Because it is sensitive directly to the potential, weak lensing is an ideal candidate for this 
cross-correlation . 

Because detection of CMB weak lensing may be possible with near-future satellite experiments, such as Planck and 
possibly even MAP, several algorithms have been proposed for estimating matter distributions, power spectra, and 
ISW cross-correlations from CMB temperature maps. Some of these methods are based on local statistics, such as the 
products of gradients of the temperature field Q]. Recently Hu ||, D, working to linear order in the deflection angle, 
determined the optimal quadratic estimator (i.e. quadratic in the CMB temperature map) for the deflection field. 
Within this linear approximation, the corresponding power spectrum estimator makes full use of the information in the 
CMB four-point correlation function ^, ^. However, the limits to the validity of the linear order approximation have 
not been well-determined, and the possibility of obtaining more information on lensing from higher-order correlation 
functions has not been studied in detail. Neglect of nonlinear terms may also create a bias in the quadratic estimators 
of the power spectrum. The nonlinear terms may be important whenever the deflection angle is comparable to the 
scale of CMB fluctuation used in the reconstruction of lensing potential. The deflection angle is of the order of several 
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arcminutes and for high resolution experiments significant amount of lensing information comes from CMB modes on 
the same scale, indicating that the nonlinear terms may be important. In order to address these issues, we use the 
likelihood function to construct estimators rather than assuming an estimator with a particular form (local, quadratic, 
etc.) and avoid linearizing in the deflection field except to compare our results to previous work and where necessary 
for computational tractability. 

We work principally in position space rather than harmonic space. This is done partly because real data are obtained 
in position space, and partly to show how the harmonic-space estimators Q can be derived from position-space 
arguments; also, the generalization of the position-space analysis to anisotropic instrument noise is more transparent. 
We also do not consider the reconstruction of matter distributions from CMB polarization; although polarization 
can theoretically yield much better information about lensing than CMB temperature fluctuations iQ, it is also 
computationally more demanding, so we defer a more careful analysis to a future work. 

We will proceed as follows: Section || introduces our formalism and notation, and defines the basic mathematical 



II] considers the likelihood function for the CMB and 



operations that will be used in the rest of the paper. Section 
its dependence on the lensing potential (the potential that generates the deflection field). In Section IV we consider 
the maximum likelihood estimators for the power spectrum of the lensing potential and its cross- corr elation with 
the CMB. In Section 0, we describe our numerical implementation of the estimators from Sections |lll| and |IV|; t he 
performance of the estimators, as determined numerically, is described in Section VE. We conclude in Sectioii|Vl|. 



II. FORMALISM 



A. CMB 



The cosmic microwave background temperature fluctuation in a particular direction n on the unit sphere is 
defined by 6(n) ~ T{n)/To — 1 where T(n) is the CMB temperature in direction n and ro=2.72 K is the mean 
temperature of the CMB. This temperature fluctuation can be expressed in harmonic space as 



oo I 

®(") = E E (1) 

where the Yim are spherical harmonics and are the corresponding coefficients. The spherical harmonics are 
orthogonal and are normalized so that their squared amplitude integrates to one over the sphere: l^zmM^^ = li 
and the transformation of equation (|^) can thus be inverted as: 



e,™ = / d2ny^*^(n)0(n). (2) 

s 



Because the statistical average {&im) = 0, we extensively use the power spectrum. The power spectrum is defined for 
a statistically isotropic temperature fluctuation as the variance 



{Q:,^,eir,r) ^Cf'^SwSram'- (3) 

For gravitational lensing work, we distinguish three temperature fluctuations: the unlensed temperature fluctutation 
O; the lensed temperature fluctuation Q; and the measured temperature fluctuation 0. Throughout this paper, we 
will take the primary (unlensed) anisotropy to be a Gaussian random field. The measurement is related to the 
actual temperature fluctuation by the instrument noise, e: 



e(n) = e(n) + e(n). (4) 

We assume that the instrument noise e is independent of O. 

Occasionally we will use the flat-sky approximation, in which a map Q can be expanded in Fourier modes, 
0(n) — X^i The Fourier modes are normalized over an area of 47r, and populate the 1-plane with 

a two-dimensional density of I/tt; this ensures that the flat-sky and all-sky normalizations are consistent on small 
scales. 
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B. Lensing 

Gravitational lensing of the CMB by scalar perturbations can be expressed in terms of the lensing potential $, 
defined by 

e(n) = e(n + V<I>(n)), (5) 

where V is the 2-dimensional gradient operator on the unit sphere. The lensing potential <I> is the projected gravita- 
tional potential along the line of sight (see the Appendix for details), 

<J'(n)^-2£^..^(.n,-.)(^-^), (6) 

where ris is the comoving distance to the last-scatter surface, ^(x, t) is the gravitational scalar potential at comoving 
position X and conformal time r, and T(r) is the tangentlike function (tanr, r, or tanhr depending on whether the 
universe is closed, spatially flat, or open). The convergence k = — iV^$ is positive when structures along the line of 
sight act as a converging lens (i.e. when they magnify the CMB) and is negative for a diverging lens. Conceptually, 
we would thus expect k to be a measure of the projected density perturbation; as shown in the Appendix, this is 
indeed the case. We define the power spectra C** and C/*'* = P(Z -I- l)^C**/4, and the cross-correlation Cf *, in 
analogy to equation 

We will in several instances require use of the lensing operator A that performs the operation in equation (^) : 

A[$]e(n) = e(n + V$(n)). (7) 
On occasion, we shall refer to the linear approximation to the lensing operator: 

e = AG w e-f-ve- v$. (8) 

Note that we have used Q to represent the lensed CMB ternperature and Q to represent the unlensed temperature; 
some authors have used this convention ||] , while others |l[ ||, g, [l^ have used 9 for the lensed and 9 for the unlensed 
temperature. 

C. Convolutions and Integrals 

A convolution of a function 9 on the unit sphere with kernel C is written as 

C9(n) = / d2n'C(n,n')9(n'), (9) 

JQ. 

where f2 is the region in which we have data, and the kernel C can be decomposed in multipoles using the Legendre 
polynomials: 

We will also need to take the inverse operation C^^ such that CC^^9 = 9. In the case of a true full-sky experiment 
(i.e. one that acquires usable data over the full 47r steradians), the operation is trivial: we apply a convolution 
with a kernel with multipoles (C~^)/ = (C/)~^. The inversion is more difficult on a portion of the sphere, as 
discussed in Section 0. 

Finally, we make use of the notation derived from linear algebra: our "column vectors" are functions on O, and our 
"matrices" are linear operators on this set of functions: Av{yL) = j^A{^x.,y)v{y)d?y. Example uses of this notation 
are u^v — J^uvcPn and A'^{x,y) = ^(y,x). 
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III. LIKELIHOOD ANALYSIS 



We analyze the likelihood function for gravitational lensing because this function retains all of the information 
provided by the observations. In particular, we can compare the "optimal" maximum likelihood estimators (MLEs) 
to previous results. We examine the relationship between the quadratic estimators and the likelihood-based estimators 
and the criteria for their equivalence, i.e. for optimality of the quadratic estimator. 

We will see that the lensing potential $ is detectable because its presence breaks spherical symmetry and thus 
causes correlations between the different spherical harmonic modes of the temperature field O, i.e. it creates off- 
diagonal elements in the covariance = {QQ'^) when expressed in the spherical harmonic basis; this is manifested 
in real space by an anisotropic correlation function C®®(x, y). Since these off-diagonal elements are, in the linear 
approximation, proportional to the lensing potential $, we could take QQ^ as a crude estimate of the covariance 
and form linear combinations of the off-diagonal elements to construct an estimator for $; this is the essence of the 
quadratic estimator methods (In the presence of instrument noise we measure and not Q but the idea is the 
same.) Note that while some quadratic estimators (e.g. [Q) have been derived from considering the magnification and 
shear of small-scale CMB features by larger-scale lensing modes, in analogy to the weak lensing of galaxies, such a 
picture is not essential to the quadratic estimation framework - quadratic estimation is possible whenever the linear 
approximation to is valid. The likelihood method, while somewhat more involved, is useful to investigate for two 
reasons: first, unlike quadratic estimators, MLEs are guaranteed to be asymptotically efficient (i.e. it is impossible 
to achieve lower error than the MLE in the limit of an infinite amount of data); and second, the likelihood approach 
retains its validity even when higher-order [e.g. 0($^)] te rms in the covariance are important. 

This section will be organized as follows. In Section [II A, we introduce the likelihood function and its basic 



propertie s and give a formal expression for it. We maximize the likelihood function using the calculus of variations 
(Section IIIB) and proceed to show that within the linear approximation [equation (^)] the maximum likelihood 
estimator reduces to the optimally weighted quadratic estimator (Section III Cp. We examine our ability to reconstruct 
the primary CMB anisotropy Q in Section HID. We conclude in Sections |lll E and [II F by examining the limits of 
validity of the linear approximation. 



A. Likelihood Function 



Likelihood maximization is a generally applicable method to statistical estimation problems. A statistical estimation 
problem involves a data set, in this case the measured CMB temperature fluctuation 0(ni) at N points {rii, ..,n^r}, 
which has a probability distribution determined by a set of parameters, in this case the values of the lensing potential 
The problem is to estimate the unknown parameters <& from the observations 8. We represent the probability 
distribution by a density function P, which is related to the differential probability dH for obtaining temperature 
measurements between 8(ni) and 0(ni) -f- dQ{ni): 



dU^ P{e\^)dQ{ni)...de{nN). (11) 

The maximum likelihood estimation method simply selects the value of $ that yields the largest value of P, i.e. the 
value of that would have been most likely to generate the observed 8. While this method is very general and can be 
applied to a wide range of problems, maximum likelihood estimators (MLEs) are frequently very difficult to compute, 
as is the case here. 

For convenience, we will work not with the likelihood function but with its negative logarithm C, which is defined 
by the relation 



= -lnP(8|$). (12) 

If we assume Gaussian instrument noise of covariance C" , we find that for fixed $, 8 is a Gaussian random field with 
covariance 



C««[$] = A[$]C««A[$]^ + C", (13) 

where the transpose of the linear operator A is defined by A-^(x, y) = A(y,x). The probability density of 8 is 
then related to its covariance via the usual relation for a Gaussian: 
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p(e|$) 



==expi -^e^c^Q-ie 



(27r)^/Vdet C^^ 

Combining this with the definition (O) and the standard Gaussian probabihty density formula, we find that 



(14) 



£[$] = ie^(c^^[$])-ie + iindetc^^[$]. 



(15) 



In some cases, we will use a Gaussian random prior for in which case we will use the negative log posterior 
probability V in place of the negative log likelihood C. The Gaussian prior for <i> is 



P($|C**) = 



1 



(27r)^/Vdet C** 



1 



cxp --^^C 



(16) 



where TV is the number of pixels in the map. From this the negative log posterior probability can be determined (up 
to an irrelevant constant) to be 



7?[$;C**] -lnP($|C** 
where C** is the covariance of the prior for <i> 



ie^(Cee[$])-ie + i lndetC^^[$] + i$^(C**)-i$ + ^Indet C**, (17) 



B. Likelihood-Based Estimators 



We construct estimators using C and V by setting their functional derivatives with respect to <I>(n) equal to zero. 
Differentiating equation (|l5| ) gives 



2 (5$ 2 



(C««[$])- 



5$ 



Using equation (13), we calculate the functional derivative of C®®[$]: 



(18) 



<5C^e[$](y,z) _ ^,,^l^ee^,,, ,y^'5A[<I>](z,y'),2„, 



(5$(x) 

We differentiate A using equation (|^): 



d y + transpose. 



,(5V$(x') 



(5$(x)' '''''' 5$(x) 5V$(x') 

Using this relation and integration by parts, we convert equation (fol) into 



-(A[$]i;)(w) = (Vw^^'^(w - x)) • (A[$]Vi;)(A 



(19) 



(20) 



^CeQ[$](y,z) 
(5$(x) 



d2y'(A[<i>]C«^)(y,y')(V,<5(2)(z - x)) • (A[$] V)(z, y') + transpose 



{WJ^^\z~x.)) ■ (A[$]VC^®A[$]^)(z,y) + transpose. 



(21) 



We also express the trace as an expectation value using the identity Tr (X) — {uXC ^u) with u drawn from a 
Gaussian distribution of covariance C, and integrate by parts again to yield 



dC[<S>] 
<5$ 



= V • 



e(c^^[$])^iA[$]vc®®A[$]~\c^^[$])-ie 
V • [e(c*^^[$])-iA[$]vc®®A[$]-i(c®®[$])-ie 



(22) 
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The functional derivative of V differs by the addition of a C** ^ $ term. The maximum-hkehhood estimator $ for $ 
is then the solution to (5£/(5<i> = 0. If we then define the likelihood gradient G[$] by: 



(Cee[|,]-ie)A[l>]VC®®A[$]-i(C®®[l>])-ie 

(23) 



■(Cee[$])-ie)A[$]VC®^A[$]-i(C®®[$])-ie 
then the maximum likelihood estimator becomes; 

G[$] = 0, (24) 
whereas the mode of the posterior probability distribution (i.e. maximum of e^^) is the solution $ to: 

$ = -C**G[$]. (25) 

In deriving equation (|23|), we have dropped boundary terms. In our implementation (Section ^ we simply do not 
work near the survey boundaries, however, the formalism can be generalized to include these by setting C"^"^ = oo in 
the unscanned regions. This does not cause numerical difficulties because the inifinte eigenvalues of C^^ (and hence 
C^^) become null eigenvalues of C^^-^ 

As a final note, the expectation value in equation (p^, which derives ultimately from the determinant in the 
Gaussian probability density, is small when the noise is small {C^^ <C G^®) and we are far from the boundaries of the 
region of sky surveyed. This is because substituting the zero-noise limit for G) and C*®® into the expectation value 
converts it into 

A[$]([((G®®)-ie)ve]). (26) 

We next note that for a statistically isotropic unlensed and an all-sky survey, the expectation value in equation ( [2^ ) 
must vanish because it is a 2-vector (i.e. a vector on 5^) and hence a nonzero value would pick out a preferred direction. 
Near a boundary of the surveyed region, this argument fails because the boundary breaks rotational symmetry. The 
expectation value in ( p3| ) thus acquires a nonzero value only in the presence of instrument noise and boundary effects. 
Conceptually, we understand this as a property of equation (p^): noise adds the C^"^ term to G®®, while boundary 
effects alter the unit determinant of A. Without these effects, detG®® — detG^® = const, and the expectation value 
in equation (|2^), which is merely a derivative of the log-determinant, vanishes. We further note that modes with large 
noise (G^^ ^ G^®) do not contribute to G because of the C^^~^ which appears twice in equation ([2^). Since most 
CMB experiments have only a small range of I for which G" and G^® are of the same order, and it is only in this 
regime and near boundaries that the expectation value in equation (|2^) is important, we will neglect the expectation 
value in the remainder of this paper. That is, we approximate: 

G[$] « V • [(G®^[$]-ie)A[$]VG®®A[$]-i(G^®[$])-iej . (27) 

C. Linearized Version of MLE 

In order to connect equation ( p^ ) to previous work on quadratic estimators, we approximate the right hand side of 
the equation to linear order in $: G ~ Go + F^, where the likelihood gradient G is computed from equation ([27|), F 
is the matrix of second derivatives i5^£/(5<I>(5$ (independent of $ in the Gaussian approximation), and Go is equal to 
G evaluated with no lensing: 



Go«V- 



e(G®® + G")-iVG®®(G®® + G")-iel . (28) 



In order to obtain a quadratic (rather than merely a rational) estimator, the approximate curvature matrix F must be 
taken independent of 8. We will therefore replace it by its expectation value (F) averaged over G). This expectation 
value is: 
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(29) 



where m and n are the matrix indices of the matrix of second derivatives. We may compute the last integral by 
requiring that the probability distribution for 8 be properly normalized: 



1 = J PeP(e|$) = j POe^^f®!*'. 
Taking the second derivative of this equation with respect to $ gives: 



(30) 







(31) 



which enables us to rewrite equation (G^ as: 



F„ 



(32) 



Thus the matrix of second derivatives is simply the covariance of the likelihood gradient. (Note: we will call the matrix 
of second derivatives the Fisher matrix even though the technical definition of the Fisher matrix differs from F for a 
non-Gaussian likelihood function.) We choose to evaluate _F at $ = (no lensing) for convenience, although within 
the Gaussian approximation F can be evaluated anywhere. At $ = 0, equation (|27| ) for G simplifies dramatically and 
we have: 



V • 



(Cee-ie)VC®QC^^-ie] {V- [(cee-ie)vce0c^^-i 



e 



(33) 



which is recognizable as a four-point correlation function of 0. If we switch to the flat sky approximation, and assume 
the noise is isotropic, and C"^*^ become diagonal in Fourier space. Then we can compute the four-point correlation 
function for Gaussian O using Wick's theorem: 

(©llQlaQlsQu) = C'h^Qf ^'5ll+l3:0<5l2+l4:0 + Q^^Qf ^<5li+l4,0<5l2+l3,0 + C'u^Qf ^'5l2+l3:0<5li+l4,0- (34) 

This gives the result for F: 



Ft, 



1 

8^ 



ll+l2=L 



[L.(liC[^e+l.Cg^)]^ 

(Cee + c,Y)(Cee + c'4.)' 



(35) 



Equation ( |3q ) is recognizable (apart from a factor of Att/L'^ due to normalization convention) as the noise variance 
and optimal weighting derived by Hu We use the flat-sky approximation only to c ompu te the noise curves in 
Figure ||; in our simulations we will evaluate F via a Monte Carlo technique (see Section V B ) . 
We can then construct the maximum likelihood estimator for $ under this approximation. 



MLE 



^F-^Gq ^ -F-'^W 



e(c«« + c")-^vc««(c«« + eye 



(36) 



and the corresponding approximate mode of the posterior probability density: 



-((c**)-i + f)-'^Gq = -((c**)-i + F)-iv • e(c®® + c")-ivc®®(c®® + c")-ie 



(37) 



Both of these are recognizable as quadratic estimators, i.e. they are second-order polynomials in Q. By spherical 
symmetry, if Q is statistically isotropic then the vector quantity in brackets, and hence <i>, will have expectation value 
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zero. Thus equations ( |36| ) and ( p7\ j are measuring the deviation of Q from statistical isotropy that arises from lensing 
by a potential $. These deviations from statistical isotropy in position-space appear as correlations between different 
spherical harmonic modes in harmonic space; see 10] for the associated harmonic-space estimator. It can be shown ||] 
that within the linear approximation, equation (|36| ) provides an unbiased estimate for the lensing potential $ when 
averaged over an ensemble of primary CMB anisotopies Q. 

Having determined these approximations, we consider the conditions of their validity. The linearization of the r.h.s. 
of equation ( pO| ) clearly corresponds to a Gaussian approximation to the likelihood function, with the second-order 
Taylor expansion of C carried out around $ = 0. This can be expected to be valid when the maximum likelihood 
point is "near" $ = in the sense that <f> ^ C"/C"' (where the ' denotes a functional derivative with respect to 
$). Therefore it would be reasonable to expect that the estimators in equations (^6|) and ( |37| ) break down when the 
lensing effects become large, i.e. when C** becomes sufficiently large. We analyze this possibility analytically in 



Sections [II E and III F and numerically in Section V E 



D. Reconstructing the Primary CMB 



We next wish to reconstruct the primary (unlensed) CMB Q from observations Q of the lensed temperature field. 
Because our instrument gives us one function on the sky, 0, it is not in general possible to simultaneously reconstruct 
the primary CMB anisotropy, 8, and the lensing potential If the CMB and lensing power spectra are given, 
however, we can use the power spectra as a prior and construct a Bayesian posterior probability distribution for 6 
and $ and maximize it. While this docs not permit determination of the primary anisotropy to arbitrary accuracy, it 
is the best that one can hope for if only the lensed CMB temperat ure is ava ilable . The determination of the lensing 
potential and primary CMB power spectra is discussed in Sections [V B and IV E . 



To estimate the primary CMB anisotropy 8, we take a Gaussian prior for both the primary CMB and the lensing 
potential. This gives us a joint posterior probability distribution for 8 and $ of e~^, where TZ is given (up to an 
additive constant) by 



1 



7^[8, *] = -e^ic^^r^e + -$^(c**)-i$ - inP(e|8, $) 



(38) 



where ^(818,$) is the conditional probability of observing temperature 8 given a primary CMB temperature 8 
and lensing potential <&. It is readily noted that P is simply the instrument noise curve, which we take to be Gaussian: 



lnP(8|8,$) 



1(0 



A[$]8)^(C")-i(e- A[$]8). 



(39) 



Equations ( |38D and (|39|) formally express the joint posterior probability distribution for 8 and $. In order to 
reconstruct the primary CMB, we integrate out the lensing potential to find the negative log posterior probability 
distribution TZ for 8: 



-i(e - A[$]8)^C"-i(e - A[$]8) - l8^C®®-i8 - 



(40) 



This equation is difficult to evaluate. In the linear approximation, however, we may replace A[$]8 with 8 -I- V<f> • V8; 
this makes the integral Gaussian, so it can be evaluated analytically to give 



-^[8]^(C' 



$$_i 



-e'c 



2' 



+-(8-8)^C' 



8 



(41) 



where 



g[8](x)=V- (C7"-i(e-8))V8 (x) 



(42) 
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and 



^[e](x,y) = (V..)(Vy)(Ve(x)C— i(x,y)Ve(y)). 



(43) 



Note that we have used integration by parts to write ^[6] and Q[&]. The matrix J-[0] is manifestly symmetric and 
can be seen to have all nonnegative eigenvalues as follows: if we take any real map X(x), then 



X' T[e]x = (ve • vxy c"~i(ve • vx) 



(44) 



Since the inverse noise matrix C"^*^"^ is symmetric and positive-definite, this quantity must be nonnegative. Indeed, 
this can only be zero if V0 • WX — everywhere, that is, if X is constant on flows of V0. If we have all-sky 
coverage and Q is well-behaved, then all of the flow curves of V0 connect at the maxima, minima, and saddle points 
of 0, consequently in this case J- is positive-definite except for the constant / = mode. Consequently, the matrix 
(j^<s>-i _|_ jrj'Qj jjy^jgi positive-definite, which is required for (|4ll ) to make sense as a probability distribution (this 
was also implicitly assumed in doing the Gaussian integral). We next define the (not symmetric!) matrix H[Q] by 



H[e]x = ve • vx 



(45) 



so that T[e] = HiQfC'-^Hie]. We can see, using integration by parts, that ^^[6] = H[efC-\e - 6). 

To make further progress, we use equation (^) to rewrite the first term on the right-hand side of equation (p|): 



c") ^(e-e) 



+ie7'(cQQ)-ie + 1(9 - efc''-\e - ©) + ^ indet(c**-i + .^[e]). 



(46) 



Even this equation is too complicated to be useful in this form, so we will make the replacements H[Q] H[Q] 
and F[0] F[Q]. This converts equation ( p] ) into a Gaussian posterior probability distribution. The peak of the 
posterior probability distribution is 



e 



PEAK 



and its covariance (i.e. inverse-curvature) is 



e. 



ee-i 



(47) 



Gov [6] = 



-C 



ee-1 



(48) 



It is instructive to compare equation (^) to other means of estimating 8. Note that determination of 8 is a 
nontrivial task since both lensing and instrument noise must be taken into account. In the limit that lensing is 
negligible ((7** — > 0), we derive 8peak ~ {C^^~^ -|- C®^~^)~^C"^'^~^8, which is recognizable as a simple Wiener filter 
of 8. In the opposite limit, where instrument noise is negligible compared to the effects of lensing (i.e. C"^"^ "^0)? we 
derive 



via a first-order Taylor expansion in Substituting this into equation ( p7| ) yields 

8pEAK = (i/ [e]^-ic**-iii[e]-i 4- j ij[e]^-ic**-iii[e]-ie. 



(49) 



(50) 
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which is recognizable as a Wiener-filtered temperature map with iJ[(9]C**7J[0]^ playing the role of the noise co- 
variance. This is not surprising since if [8]C**i/ [G)]"^ is the covariance of the temperature change due to lensing, 
& — &, and under our assumptions the correlation between Q — @ a nd Q vanishes. Further simplification is possible 
by noting that, for zero noise, the likelih ood gradient Go of Section III C may be written as Go = —H[Q]^C^^~'^Q. 
Then the Fisher matrix of Section III C can be approximated as 



F = (GoG^)*=o = (i/[e]^G««-iee^G««-iii[e]) = (ij[e]^G®«-i(ee^)G««-iij[e]) 

= {H[@fC'^^'^H[e]) « HiefC'^^^-'HlQ]. (51) 

(The last equality on the first line is justified as follows: since H[Q] is a linear function of 0, and is a 
Gaussian random field, Q and H[Q] are jointly Gaussian. Thus the expectation value of the four-point function 
HlOf^ C^^^^QQ'^ C^^~^ H[Q] can be expanded using Wick's theorem as a sum of three terms, each of which is a 
product of two-point functions. Since (Go) = 0, the final expression on the first line of equation (^ij) is the only nonva- 
nishing term.) If we further assume that the lensing effect can be treated as a perturbation on the background CMB 
- an assumption that we have made already through the linear approximation - we can approximate H[0] ~ ^^[0], 
which allows equation ( ^o|) to be rewritten as 

OpEAK = H[Q] (G**-i + py^ G**-ii/[e]-ie = e - H[e] (g**-i + ry' FH[e]-^e 

= e - H[e] (G**-i + Fy^ f [ei^G^^-^e. (52) 

Using equations ( p8| ) and ( p5| ) and integration by parts we see that H[Q]C^^~^0 = — Gq. Also, comparison of 
equation ( ^5| ) to the lensing operator definition, equation (^, indicates that in the linear approximation, A[<I>i]Gi = 
81 + ii[6i]$i. This allows us to simplify equation ( p2[ ) to 



epEAK = e + ii[e](G**-i + i^) 'Go = a[(g**-i + f) 'Go]e. (53) 

This is the observed temperature map "corrected" for lensing using the Wiener-filtered potential map, equation (|3 
It is thus the temperature analogue of the approach used in ||l2[ for reconstructing primary polarization. 



E. Onset of Nonlinearity 



We examine the validity of the linear approximation leading to equations (36) and (|3^) using the real-space Taylor 
expansion of the lensing formula, equation (^: 

e = AO = e + v$ • ve + iv$v$ : we + o($^). (54) 

Quadratic estimator was constructed based on the first-order (i.e. order $^) effect of lensing on G®®, which neglects 
the second-order and higher terms in equation (jsj) , as well as the covariance of the first-order term. Thus we expect 
that this approximation will be good if the ratio of successive terms R in equation (|5^) is small. As a simple (and 
naive!) first approach to determining when the linear approximation is valid, we note that if L denotes the typical 
multipole of <&, and I denotes the typical multipole of Q, then R — Ll^. Since the mean square value of $ is roughly 
L^Cf"^, we find that R^ « L'^Cf'^P, so the linear-order approximation breaks down at G** > L~H~'^ . Given that 
L'*G** has a maximum of approximately 10~^, we would then conclude that nonlinear effects could become important 
at I > 1000, i.e. Planck (Zmax ~ 1600) and higher-resolution experiments might be susceptible to these effects. 

A more refined version of this analysis would examine the covariance G®® (x, y) of the observed temperature instead 
of simply the temperature fluctuation. This is because for the long-wavelength lensing ($) modes, the second-order 
($^) corrections to the covariance are significantly less than calculated by the naive method above. Conceptually, 

one can understand this by noting that, because the primary CMB is statistically isotropic, G®® is sensitive to the 
relative, not absolute, deflection of photon trajectories. In the flat-sky approximation we have: 

G«^(x, y) = G"(x, y) + G««(x - y) + (V<I'(x) - V$(y)) • VG««(x - y) 

+ i(Vi>(x) - V$(y))(V$(x) - V$(y)) : VVG®^(x - y) + 0($3). (55) 
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We note that if V<I> is slowly varying compared to the separation of x and y (that is, <C 1 where 7 = |x — y|), a 
near-cancellation occurs between the linear terms in equation (|55|). This cancellation reduces the squared expansion 
parameter from L^C**P to j^L^Cf'^P. Since 7 can take on a wide range of values from the scale of the lensing 
mode, L~^, down to the limit of the instrument's resolution /"^xj is not at all clear how to proceed analytically 
with this approach. 

F. Bias of quadratic estimator 

Another way to measure the importance of nonlinear terms is to compute the bias in the quadratic estimator 
[equation (^6|)] for <i>, over an ensemble of primary CMB anisotropics O and instrument noises e with the same lensing 
potential $. This bias vanishes in the linear approximation [||. It can be computed by noting that the expectation 
value of ^cqds^) is a linear combination of covariance matrix elements of C®®. We first switch to working in Fourier 
modes on a flat sky; in Fourier space, the two- mode correlation function of the observed temperature is given by the 
Fourier transform of equation (|55|): 

(61,61,) = (CiY + Ci^®)5i,+i,.o + -^(ii + 12) • (iiG, + hCi,)<Pi,+i, 

v47r 

+ ^ E ^k.'&k, [(kl • ll)(k2 • + (ki . h){k2 ■ l2)C«« - 2(ki . J)(k2 • J)Cf «] (56) 

ki 

where we have set k2 = li + 12 — ki. Then we may use this two-mode correlation function to evaluate the expectation 
values of equations (^8|) and hence the quadratic estimator (^6|): 

(($cq(|))e,e)L = *L + ^^^^^<i>k,<i>k. 

kl 

• J2 [(kl • ll)(k2 • + (kl . l2)(k2 • l2)C«« - 2(ki . J)(k2 • J)Cf , (57) 

ll 

where we have set I2 = L — li, k2 = L — ki, and J = ki — li, and: 



•-'^ (Cee + c«)(Cee + C|;.)- 

We can then compute a mean-squared bias by squaring the bias and ensemble- averaging over <I>. (Note that since 
different Fourier modes of $ are uncorrelated, the terms in the sum over ki will usually add incoherently. The 
exception to this rule is that terms related by switching ki and k2 are equal, so the mean squared value of the sum 
is double the value obtained by summing the mean square of every term.) This mean-squared bias is then given by a 
quadrilateral integral: 



(|<5<fLp)^- (|((<i'cq(g))e,e-<i>) 

1 



512^ I ^ ^'^'^ 



y d^iiTi.a. [(ki.li)(k2-li)Q^«-H(ki.l2)(k2-l2)qf^-2(ki.J)(k2-J)Cr]| , (59) 

We can thus construct a nonlinearity parameter R2 that is the ratio of the RMS bias to the RMS value of the lensing 
potential: i?| = (I'^'I'iP)^ /C**- This nonlinearity parameter is plotted as a function of L for three experiments 
in Figure 0; the parameters for the three experiments - MAP 4-year data, Planck, and a future high-resolution 
experiment - are shown in Table ||. The R2 nonlinearity parameter is small for all but the high-resolution experiment, 
indicating that the bias in the quadratic estimator [equation (^6|)] is small. 

As a final means of testing the importance of the higher-order terms in the expansion of C®®, we conduct a 
numerical "experiment" in Section^ that compares nonlinear [equation (p5|), without the small and computationally 
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Nonlinearity parameter Rg for different experiments 
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FIG. 1: The dimensionless nonlinearity parameter R2, equal to the ratio of RMS bias in the quadratic tensing potential 
estimator [equation (^)] to the RMS value of the potential, is plotted here for several experiments as a function of multipole 
(wavenumber) . Note that this quantity is less than unity for all of the experiments. See Table | for experiment parameters. 



difficult expectation value] and linear [equation (|37|)] lensing potential estimators. There we find only modest (10- 
20%) improvement in the RMS error of the lensing potential reconstruction even for a high-resolution (1 arcminute 
beam) experiment. 



IV. POWER SPECTRUM ESTIMATION 



Having constructed an estimator for the lensing potential $, we next consider its power spectrum C**. Concep- 
tually, the situation here is more complicated because once we average over an ensemble of lensing potentials derived 
from the same power spectrum, the lensed temperature field 9 is once again statistically isotropic with (96'^) diag- 
onal in harmonic space. (That is, the off-diagonal elements average to zero since ($) = 0.) But we can still construct 
an estimator for (7** = ($$^) by taking the quadratic estimator for $ and computing its "square". The resulting 
power spectrum estimator is thus constructed from the four-point correlation function of 9 or (in the presence of 
noise) 9. It is thus measuring deviations of 9 from Gaussianity. We will show that in the linear approximation, the 
maximum likelihood estimator reduces to the quadratic estimator. 

We begin this section by formally writing out the likelihood functio n for the lensing power spectrum (7** as an 
integral, and then approximating this integral as Gaussian in Section IVA. In Section IV B, we approximate the 
curvature (inverse-covariance) matrix of this G aussia n in order to obtain a maximum likelihood estimator that is 
computationally tractable. We show in Section IV C| that within the linear approximation, the MLE and quadratic 
estimator are equivalent. Computation of the primary CMB power spectrum (7^® is considered in Section [VD, and 
cross-correlations, e.g. C^*, are considered in Section tVE. 
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A. Likelihood Function and Gaussian Approximation 

In principle, we could estimate the power spectrum C** by constructing a grand likelihood function L given (up 
to an additive constant to £, or equivalently a multiplicative constant to e^''') by 



C = -\nJ P$P($|C**)e-^ = - In J I?$exp (^-C[<^>] - i^'^C**-^*^ = - In J P^e"^, (60) 

where the integral over I?$ is a functional integral with the usual measure on where N is the number of pixels in 
the map. Note that £ is a function of the covariance C**; however, we cannot simply maximize the grand likelihood 
function, equation (^0|), and thus obtain an estimate of C**, because our map provides us with N real observations 
whereas C** has A^(A^+l)/2 independent parameters. In order to obtain a meaningful result for the power spectrum, 
we must restrict the form of C**. Fortunately, the spherical SO{3) symmetry of the sky provides us with just such 
a restriction - it forces C** to be diagonal in /-space. We will thus assume in this section that C** can be written 
as a linear combination, 



C**(x,y)^^C7rC(x,y), (61) 

a 

where the a's are indices labeling the basis covariance functions and we wish to evaluate the coefficients C**. There 
are two interesting choices of basis function Ca- The first is the Legendre polynomials, which span the space of C*** 
that are consistent with symmetry requirements. These basis functions are given by 



2/ + 1 

CKx,y)=.^Pz(x.y). (62) 

This results in coefficients C** that are the power spectrum of $. The other choice, useful in the low-SNR case, is to 
add several functions of the type equation ( |6^ together to boost the overall SNR, i.e. to estimate the lensing power 
spectrum in a band rather than for each individual I. In this case the coefficient C** is a weighted average of the 
power spectrum over the range of I values covered by the basis function Ca . 

We have now set up the maximum likelihood estimation problem for C**. Before proceeding to compute the 
maximum-likelihood point, we warn the reader that there is no guarantee that the likelihood function is devoid of 
local maxima. Most of the methods described here cannot avoid local maxima, nor can they be readily adapted to 
detect local maxima. The exception is the Markov chain method, although the number of iterations required to escape 
from a local maximum may be prohibitively large. 



Since A'' is a large number (typically 10^-10^), brute- force integration of equation (60), does not appear feasible. 
There are at least two conceivable approaches to this problem: a Markov chain (MC) integration, or a Taylor expansion 
of the integrand. While the MC approach is dramatically faster than a brute-force integration, it is apparent from 
the high dimensionality (one dimension for each map pixel) of the problem that many iterations in the sequence will 
be necessary for convergence. We have not found a computationally feasible implementation of MC for this problem. 
The alternate approach is to Taylor expand V to quadratic order in $ around its minimum <I>min, i.e. to approximate 
the posterior probability distribution for $ as a Gaussian. This gives 



«-ln I P^exp en - ^($ - ^n.^nf ^^'^^^l'^f^''^ (^ ^ <i>n.in) 

« ^[$min, Cf* ] + i Indet ^Y^'g^'' ^ ^ n'^min, C^''] + - In dct K, (63) 

where the curvature matrix K (the matrix of second derivatives of V with respect to $, evaluated at $min) has been 
introduced and an irrelevant additive constant has been dropped. Using equation (|63|), we seek to minimize the grand 
likchhood function £. To do this, we differentiate the final result of equation (|63|), yielding 
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= 



_d£_ 



d 



dV 1 

9C** ^ 2 9C** 



■ In dot K 



d 



In det 



2 ac** ' 



(64) 



where the final (chain-rule) term reflects the shifting position <i>niin of the minimum as we change C**. There is no 
corresponding chain-rule term for V because at the minimum, dV/S^ vanishes. We may now evaluate the derivative 
of V, noting that only the prior term in equation ( p^ has a dependence on C**. Combining the log-determinant of 
C** from the prior with the log-determinant of K in equation (^) transforms (^) into 



= -i<in(C**)-C„(C**)-i$„.„ + lndet(C**if)k_ + ^ Ift ^^^li^ > *™ ' ^''^ 

At this point we are confronted with the difficulty of computing the curvature matrix K. Unfortunately, brute force 
computation of K requires 0{N'^) computations of V, each of which must require at least 0{N) elementary operations 
since it accesses N data points; in practice, a computation of V involves spherical harmonic transforms consisting of 
(3(7V3/2) operations. 



B. Approximating the Curvature Matrix 



In order to compute the curvature matrix in equation (|6^), we split it into two parts: the curvature of the likelihood 
function {F, which is the Fisher matrix in the Gaussian approximation) and the curvature of the prior, which is always 
(C**)-i: K (C**)-i. This provides us with the identity 

C**K = C**F+lwx7V, (66) 

where Inxn denotes the NxN identity matrix. We use the value of F from the Gaussian approximation: F ~ {GG^). 
If we are far from boundaries or regions of nonuniform noise, F is diagonal in harmonic space and we may approximate 
it in bins accordingly: 

F-i(x,y) « 5][i^-i]„C.(x,y) ee J] ^C4x,y), (67) 

OL a 

where the last equaHty defines the binned Fisher matrix F^. In this approximation, F = (GG^) reduces to 



Fa = ^ (G^Ca^G) , (68) 

da 

where the expectation value can be computed by a Monte Carlo analysis, and da is the number of lensing modes in the 
band covered by Ca- Technically it is best to compute the Fisher matrix at the value of $mini however for purposes of 
computational tractability we only compute it once at C** = 0. $min = [see equation (|9^)]. In this approximation 
SK/S^ vanishes so we will drop the final term in equation (|6g). We can then differentiate the log-determinant of 
C'^^K with respect to a power spectrum coefficient: 

lndet(C**/^) = Tr [(C**F + iNxN^CaF] = Tr [(C** + F^^'Ca] ■ (69) 
With this approximation, (B3) and (p9h give 



„(C**)-iC(C**)-i$„,,, ^ (70) 

where the denominator in the second term uses the C** value appropriate for the range of multipoles covered by the 
a basis function. 
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Linearization 



If the lensing is sufficiently weak, i.e. if we are in the linear regime (see Section [HE), and we are only using the <i>'s 
far from our boundary, we can solve equation ( [70| ) directly. To do this, begin by examining equation ( |70| ) in Z-space 
(assuming diagonality) : 

= ^ (71) 

where the sum is over the da lensing modes grouped into the band a. If we now take the multipole moments 
= (C**-i + Fa)-^Gotm given by (|^), we derive 

= -7 77? IGpimP - (72) 

Irn 

which is the same result derived by Hu Q in the flat-sky approximation. (It is valid in the all-sky approximation if 
we re-interpret F and Gq as all-sky variables.) 

D. Primary CMB Power Spectrum 

The grand likelihood function C defined in equation (|60| ) contains the complete dependence of the probability 
density of Q on the primary CMB power spectrum, C^®. Thus it can be simultaneously maximized over C*** and 
C®®. We first parametrize the primary temperature power spectrum C*®® in analogy to equation (|6l|): 

C««(x,y)=^CrW.(x,y), (73) 

a 

where the Wc's are the basis functions. Then we differentiate the Gaussian approximation, equation (|6^), with respect 
to the coefficients to determine the condition for maximization of the likelihood e~^: 

^ ^7'[$„,in,C**] + ilndetif 



dV 15...., ld<^>L,,S\ndetK 



lndetXk_ + -_|^^-_|^_. (74) 



We proceed in analogy to our analysis of the lensing potential power spectrum in Section IV B . We neglect the change 
of detK with $, thus eliminating the last term in equation ([74[). We can simplify the first term by noting that V 
consists of a prior and the unmarginalized likelihood C; the prior has no dependence on C^®, while the unmarginalized 



likelihood [given by equation (15)] has derivative: 
dC 



ie^C^^-iA[$]>VaA[$]^C®^-ie + ^Tr (c^^-^A['i>]WaA[^f) . (75) 



Combining this with equation d74), and using equation (|6q) to eliminate K in favor of F in the last term, gives us: 



= -^e^C^^-^A[<i>]WaA[<i>fC^^-^e + ^Tr (c^®-iA[$]>V„A[$]'^) + ^Tr 



(F + C 



■i>*-i\-i 



(76) 



One can readily see that in the absence of lensing, the final term in this equation vanishes, the A[<I>] matrices become the 
identity, and this equation reduces to the standard maximum likelihood result for CMB power spectrum estimation: 

= -ie^c^^-^WaC^^-'e + ^tt (c^^-'w^) ■ (77) 
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E. Correlation of Lensing With Other Observables 



We may want to compute the correlation of the lensing potential $ with some other quantity. Examples could 
include the CMB temperature 8, Sunyaev-Zel'dovich or X-ray observations of hot gases, or galaxy maps. Since the 
focus of this paper is on likelihood methods, and approximations to them, we will restrict our attention here to the 
case of determining Cf"^ where Z is an observable which has a jointly Gaussian distribution with $. This situation is 

expected to be a very good approximation for the CMB-lensing correlation Cf * introduced by the ISW effect, since 
ISW is expected to be apparent primarily on large scales which are still in the linear regime; some non-Gaussianity 
in the potential-induced Q fluctuations may be expected from nonlinear growth at I > 100 jT^ , but this should have 
negligible effect on the expected signal to noise. For the other observables, the situation is complicated by nonlinear 
evolution and the method described here should be used with caution. 

We will neglect any error in the determination of Z. This is not as restrictive an assumption as it might seem; if 
we wish to cross-correlate $ with an observable that has Gaussian error bars, we may write 



Z = Z + C, (78) 

where Z is the measured value of the observable, Z is the actual value, and ( is the error. If ( is Gaussian and 
independent of Z or and Z is jointly Gaussian-distributed with the lensing potential we infer the relations 



C^^^Cf^ + Cf^ and cr^Cr, (79) 

so that estimating the cross-correlation of Z and $ becomes equivalent to measuring the desired correlation C^'^ . We 
can then construct the likelihood function: 



£[Cr,C^*] = - In / 2?$exp 



(jZZ (jZ'S> 



(80) 



The estimators of Sections [II B and IVB need only minor modification in order to do a joint maximum-likelihood 
analysis of C** and C'^*. To see this, note that for a joint distribution with specified covariance, the expected value 
of $ given Z is 



E^Z] = ($)|z = C'^^C^^-^Z = AZ, 
where we have defined the slope matrix A = C'f^C'^^-i. The variance given Z is 



(81) 



= (($ - ($)|z)') = C** - (82) 

where we have used C^*(x,y) = (Z(x)<I>(y)). (Note that C*^ is the matrix transpose of C^*.) Equations ^) 
and (^2|) are general for any joint Gaussian distribution, hence they are valid here even considering the existence of 
boundaries. Using them, we can re-write the likelihood function [equation (^0[)] as 



£[Cf*,Cf*] = -Z^C^^-iZ-ln j I?$exp|-£[$]--(cI>-£;[a>|Z])^[C**U]-\<I>-i?[<f>|^])j, (83) 

which is of the same form as the first (exact!) line of equation (|60|). The additive constant \;Z^ C'^^^^ Z has no effect 
since we take Z and to be constant, so the estimators developed earlier in this paper to compute $ and C** can 
be re- written to compute $ — i?[<I>|.Z] and C**!^, respectively. We next construct these estimators before turning our 
attention to the problem of estimating the slope matrix A that relates Z to 

If we are sufficiently far from a boundary, we can diagonalize in harmonic space to yield 



E\^Z\ = 



AlZ 



and 



C |z = C (l-Pi ), 



(84) 
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where pf * is the correlation coefScient of the rth-order multipoles. We note at this point that (assuming C^^ is 
known or has been separately measured) that the sets of variables (C**, C,;^*), (C**,pf*), and (C**!^,^/) are 
merely different parametrizations of the same space of models. We can estimate any of these pairs; (C**|2,Ai) is 
introduced here precisely because it is the easiest to estimate directly. We can now immediately convert equation 
(p5|), without the trace, to yield 



$ = S[$|Z] + C**|zG (85) 

as the mode of the posterior probability distribution [where G is the likelihood gradient as specified in (^)]. The 
power spectrum estimator result, equation (|70|), becomes 

1 = " " G^CG, (86) 



where G in equation (|8^ ) is evaluated at the solution to equation (|8^). These equations specify the conditions for the 
likelihood £(G**, p'^'^to be stationary with respect to first-order variations in G**!^ with Ai constant. In order to 
complete the analysis, we must also identify the condition for £ to be stationary with respect to first-order variations 
in the Ai (slope) coefficients in equation ( p4[ ) with G**|z constant. For these variations, if we again approximate the 
Fisher matrix as F = (GG"^)|$=Oj we derive a constant curvature matrix K. Then, parametrizing the Ai in bands in 



analogy to equation (61) gives 

A(x,y) =^A„S„(x,y), (87) 

where it is assumed that Ba and hence A are symmetric. We obtain a maximum likelihood condition on A^ by 
differentiating L: 

dC _ dV _ dC f ^ SC 9$(x) _ f ,2 SC dE[<i>\Z]{y:) _ j. 



dAot dAa dAa Jfi (5$(x) dAa Jfi (5$(x) dAa 

Note that we have taken $ — £'[(f>|Z] to be constant here; this was merely a convenient choice. [Since we have 
maximized V with respect to $, we can choose any first-order variation in $ without affecting the derivative in 
equation (Bsl).] It follows that the joint maximum-likelihood estimator for (G**,^) satisfies 



Z'^BaG = 0. (89) 
We can then reconstruct the full lensing power spectrum and cross-correlation using the relations 



Cf* = Gf^A, and Gf* = A2cf^ + Cf*|z. (90) 

If we take the line a.r app roximation of Section HIE, that G ~ Go + F^, and approximate diagonality in harmonic 



app 

space as in Section [IV Q , equation ( ^9] ) becomes: 



Z^Ba^FB^Z- ' 

If we note that G^* = C^^ Ai, we note that this is the same result as obtained by correlating the linearized maximum 
likelihood estimator, equation (^ ) with Z. In the case of the G)$ correlation {Z = Q), which is of interest for 
investigating the ISW effect, the numerator is cubic in 0, i.e. the maximum likelihood estimator for Aq, is the same 
as that computed from the bispectrum. 

As a final point, we note that for the 0$ correlation, the error — e in Z — Q is not entirely independent of the 
estimation procedure for $, since we are after all determining $ from the CMB temperature measurements. Since we 
assumed C to be uncorrelated with and its determination, this is a potential flaw in our calculations as applied to 
the 8$ correlation. We expect the error induced by this effect to be small, since the ISW effect is most important on 
the large scales where the instrument noise is small: G" <C Gf ®. We additionally note that the determination of <I> 
primarily uses information from much higher I. 
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TABLE I: Reference Parameters for CMB Experiments 



Experiment 


w-^''^ 1 2.725 K radian 


(7 / arcmin 


MAP (4 yr) 


5.6 X 10~" 


13 


Planck 


2.9 X 10"^ 


6 


high-res. 


5.0 X 10"^" 


1 
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FIG. 2: The solid line illustrates the model primary CMB temperature power spectrum /(/ + l)CP®/27r. The noise curves 
l{l + l)C"/27r are shown for MAP 4-year data (top, long-dashed), Planck (center, short-dashed), and the high resolution 
reference experiment (bottom, dotted). 



V. IMPLEMENTATION AND RESULTS 



In order to demonstrate the feasibility of computing the estimators above in a reahstic situation, and to assess their 
performance, we ran several simulations in which a data set was generated and analyzed. The data sets are generated 
on a full sphere assuming isotropic Gaussian temperature fluctuations, lensing potential, and instrument noise. For 
Planck and the high-resolution reference experiment (^max ~ 3500, beam FWHM = 1 arcminute), we reconstruct 
the lensing potential and compare the reconstruction and original map. The lensing power spectrum was estimated 
for the Planck- type experiment, but computer time constraints prevented a similar analysis for the high-resolution 
experiment. 



A. Utilities 



A lensing simulation requires the capability to work with maps on the unit sphere, or some subset thereof, partic- 
ularly the capability to perform the elementary algebraic and calculus operations and to perform convolutions and 
both forward and reverse spherical harmonic transforms (SHT). We therefore require the use of a map projection or 
grid. In order to perform SHT in a reasonable amount of time, we must use an isolatitude projection, i.e. one in 
which horizontal lines are parallels of the same latitude. Further more , we found conformality to be convenient for 
differentiation and useful for reducing gridding errors (see Section VB). The only projection with these properties is 
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FIG. 3: The solid line illustrates the model convergence power spectrum CI"' — + l)^C**/4. The noise curves 1^(1 + 
1)^(-F**)"V4 are shown for (top to bottom) MAP 4- year data, Planck, and the high-resolution reference experiment, using 
curvature matrix elements Fi computed from equation (^). 



the Mercator projection, in which the map coordinates x and y are related to the longitude (f) and colatitude 9 by the 
formulas 4> = t{x — xq) and cos9 = tanh(rj/). 

The conformal magnification F defined by ds^ = r^{dx^ + dy^) satisfies F = r sin6'. A map of some quantity A is 
stored as a two-dimensonal array A{x,y) of values at the points of integer x and y. Spherical harmonic transforms are 
performed by transforming first the longitude [x or 0) direction to produce the Fourier coefficients of A at constant 
latitude and then the latitude {y or 6) direction. On a grid with points, this is an 0{N^^^) process. Convolutions 
are performed with two successive SHT's. 



B. Estimator for Map of Lensing Potential 



Our implementation presently approximates the estimator ( pq ) as follows. The expectation value is ignored since 
it is expected to be small; see the discussion following (p5|). We must also approximate the vector 



(C^e - 1 e) A [$] VC^® A[$] - 1 (C®^ [$] ) - 1 e 



(92) 



and use it to determine the likelihood gradient G = V • V. (Note that G = ^ is a scalar function on ft.) Because 
of difficulties computing C^^Q in a reasonable amount of time, we chose to approximate G~^0 by a sequence of (i) 
filtering of A~^8 using the harmonic-space kernel G'^®/(G®® + G") and (u) convolution with the (G/)-i kernel. Of 
these steps, both break down near the boundaries and (ii) breaks down when the lensing is strong enough so that the 
noise G^' in O is no longer a good approximation to the noise in A^^8. We note that if C^^ were fiat (i.e. Z^G^^ (x P), 
this approximation would become exact far from the boundaries. ("Real" instrument errors show some increase in 
C'^'^ at high I due to finite beam size |14].) In order to reduce gridding errors, the (G/)~^ operation is performed by 
convolving with the kernel [l{l + l)Ci\^ and then taking the Laplacian. In order to avoid boundary effects, V is 
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multiplied by a function q that is equal to one inside Q far from the boundary, but falls off smoothly to zero at the 
boundary. 

After computing V, we take its divergence G = V • V; then we must determine C**G. In order to reduce errors due 
to the gridding (pixelization) , we perform the convolution in two steps. First, we apply an inverse Laplacian operator 
V~^, and then we apply the remainder of the convolution, l{l + 1)C**. Because we use a conformal coordinate 
system, the inverse Laplacian can be done in the plane where gridding errors vanish (the forward and reverse Fourier 
transforms are exact inverses of each other, even on a discrete grid, which does not occur for SHT). This is important 
since the low-/ modes of G, which correspond to the lensing modes that can be recovered at moderate signal-to-noise, 
are buried in high-/ noise due to the power spectrum PC^'^ ocw in the range of interest 50 < / < 1000. The inverse 
Laplacian operation does not add significant time to the computation because it utilizes a fast Fourier transform 
requiring 0{N \ogN) operations, whereas the computation time is dominated by SHT's requiring 
some attention paid to gridding issues, this two-step process may turn out to be unnecessary. 

An iterative procedure is needed for solving equation (p5|). The obvious iterative procedure, $„+i — G**G[$„], is 
(in the linear approximation) unstable for any lensing mode with SNR^ = C'^'^F > 2, and hence is not a good choice. 
We therefore use the underrelaxed version. 



$„+i = (1 -/)$„-!- /G**G[$„], (93) 

where / is a convergence parameter. In the linear approximation, convergence would require < / < 2/(1 4- SNR^), 
however, a smaller value of / is necessary in practice to avoid instabilities resulting from boundary effects and nonlinear 
lensing effects. 



C. Power Spectrum 

We use equation (|70| ) to estimate the power spectrum G**. The basis functions of choice have constant P(/ + 1)^G** 
within some band l^^i^ ^ I ^ ^max- The number of modes covered by the basis function Ca can be estimated as 



d„ = Am 'y^?1±1^^ [(/_ + 1)2 - /^J . (94) 
The estimator, equation (|70|), then can be written in the iterative form: 



/3 

(95) 

where Pa is the projector onto the band la,min < I < ^a.max, i-e. the operation that filters out all multipoles not 
included in this band. We use V^^G here because it and its spherical harmonic transform are already being computed 
for the estimation of the map of <E>. The parameter /? is an adjustable convergence parameter. The Fisher matrix Fa 
is computed by the Monte Carlo procedure: 



c: 



c: 



F- 



da 



-{V-^GfPaV-'^G 



Fa 



— ((V-2Go)^P„V-2Go), 

CLf-y 



(96) 



where the average is taken over an unlensed temperature field (including noise). 

Note that equation ( ^5|) exhibits a difference from the quadratic estimator, equation ([72|): while ( [72|) can, in principle, 
be negative. Pa is positive-definite and hence ( ^5|) can never yield any resu lt less tha n zero . It is straightforward to 
show that in this case, assuming the linearized approximation of Sections III C and |VC, and assuming a positive 
initial guess is used for the power spectrum to start the iteration, that (|9^) tends to zero (estimates no power) . Because 
negative results are replaced by zeroes, equation (|9^ ) technically converges to a biased estimator, with expectation 
value 



(gr(est.)) 
G** (actual) 



1 , X 
-eric —pz 

2 V2 



/27r 



3x 



(97) 
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where x = daC^^ Fa/2 is the squared signal-to-noise ratio (SNR^) in the power spectrum determination. If the signal- 
to-noise ratio is large {x ^ 1) then this bias is irrelevant. Note that in the context of maximum likelihood estimation, 
a negative power spectrum estimate does not make sense, because the corresponding probability distribution for $ 
and hence the likelihood integral, equation (|60|), are ill-defined. In particular, the avoidance of negative power spectra 
is not an artifact of any approximation wc have made. 

Obtaining convergence from the coupled iterative estimators, equations ( p3|) and (|95| ) requires some care. Conver- 
gence depends not only on the values of the parameters / and /3, but also on the pattern of how many times the map 
is updated using ( |9^ ) each time the power spectrum is updated using (|95|). As an extreme example, we note that if / 
and P are taken to be very small, and we alternate between updating the map and power spectrum, convergence can 
be expected only for negative whereas if we iterate the map many times between iterations of the power spectrum, 
convergence requires positive (3. After some experimentation, we found that iterating the map many (M ^ 1//) 
times between iterations of the power spectrum and taking /3 = 1 resulted in convergence. 



D. Improvements 



While the implementation described here is sufficient for evaluating the importance of non-linear effects, much work 
remains before it could be used to analyze real data. First, real data have boundaries (if for no other reason than 
the presence of a galactic plane cut) and usually have inhomogeneous noise. Thus, the operation used here will 
need to be performed by actual matrix inversion rather than by convolution. The latter also becomes necessary in 
the event of non-uniform noise. Also, the iteration of equation (95) converges slowly and for long- wavelength modes 
{l/l comparable to the size of the gridded region) may fail to converge entirely. 

Additionally, it would be desirable to use a better approximation to equation ( |60| ) than the Gaussian approximation, 
equation (p3[), but we were unable to identify a computationally tractable method of doing this. 



E. Results 



Here we investigate the effects of nonlinearity on future CMB experiments. We use the form for the instrument 
noise Q: 



where the weight w and beam full-width at half maximum a are parameters, and the beam spot is assumed to be 
Gaussian. We use a primary CMB power spectrum Cf ® generated by CMBFAST, assuming a flat LCDM universe 
with parameters Hq^72 km/s/Mpc, rcMB=2.725 K, Yhc=0.24, N^^3 (massless), f26=0.04, ricdm=0.30. The primary 
CMB model is shown in Figure |^. The lensing power spectrum, shown in Figure ^, is computed normalized to erg = 1. 

We compare the linearized estimator to the "full" nonlinear estimator (as implemented here) for two experiments: 
the upcoming Planck satellite mission, and the proposed Atacama Cosmology Telescope (ACT) as an example of 
upcoming high resolution, low noise experiments. The parameters for the Planck and the high-resolution reference 
experiments are shown in Table ||. (The MAP 4-year experiment is also shown for comparison.) For purposes of 
computational tractability, we have restricted ourselves to a small portion of the sky: the Planck simulation was run 
on a 751 x 751 grid with spacing at equator r = 5 x 10""* radians (even though Planck is an all-sky experiment), and the 
high-resolution simulation was run on a 1251 x 1251 grid with r = 1.5 x 10""* radians. (Note the ACT survey region is a 
long, rectangular stripe on the sky as opposed to a more compact patch. Because of our implementation's susceptibility 
to boundary effects, we cannot do our simulations on a stripe.) The solid angles covered by the simulations are 0.14 
sr (~ 2% of the Planck survey area) for the Planck-type experiment and 0.035 sr for the high-resolution experiment. 

The results of these simulations are shown in Figure ^. The conver gen ce map errors (i.e. Kest — '*) for oth the 
nonlinear estimator, equation (|9^) and the linear estimator, equation (js^) were computed. The convergence map 
errors were then Fourier-transformed (since we are working on a small patch of sky) , yielding the error amplitude ni 
for each Fou rier m ode. The modes were then sorted into bins of Al = 20 according to their Z-value, and an RMS 
amplitude •\/ k^k\ was computed for each bin. The ratios of these RMS amplitudes are plotted in Figure ^. Note that 
for the Planck experiment, there is only a slight advantage in using the nonlinear estimator, whereas for the high- 
resolution experiment, the accuracy of the reconstruction is improved by using the full nonlinear estimator, equation 

©• . . . , 

Both the comparison via simulation of the linear and nonlinear estimators (Figure m and the semi-analytic bias 

calculation (Figure |^) are methods of assessing the validity of the linear approximation. Both of them suggest that 
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Ratio of RMS errors for linear and nonlinear estimators 



1.05 



CD 



(0 
CD 



o 
o 



c 
o 
c 

"o 

o 

oi 



0.95 



0.9 



0.85 



0.8 



0.75 



£ 0.7 



0.65 



1 


1 1 


1 


1 

Planck 








high res. 




















,( 

;5 










/ \ 








v' 




: \ 






: 


1 


1 1 


1 


1 



100 



200 



300 

Multipole, L 



400 



500 



600 



FIG. 4: The ratio of the root mean squared error (VMSE) for the nonhnear estimator, equation (^), to that of the hnear 
estimator, equation (^), in bins of Ai=20. Results are obtained from a Monte Carlo simulation, which is responsible for the 
bumpiness of the graph. The solid line is for Planck parameters, and the dotted line is for the high-resolution experiment (see 
Table §). 

nonlinear effects are more important for the higher-resolution experiment than for Planck, but (at least for the 
experiment considered here) arc not dominant. Note, however, that for the high-resolution experiment the semi- 
analytic calculation found nonlinear effects to be more important at higher I, whereas the simulation found a greater 
improveme nt in switching to the nonlinear estimator at lower I. Note, however, that the semi-analytic al cal culations 
of Section IIIE and the simulation of this section are not measuring the same quantity: in Section IIIE wc were 
examining the bias of equation ( p^ ) , whereas here we are considering the mean squared error of the optimally filtered 
version of that estimator. 

We also simulated the performance of the linear [equation (^2|)] and nonlinear [equation (|95|), 16 iterations] conver- 
gence power spectrum estimators for Planck parameters. These were performed on the aforementioned 0.14 sr patch 
of sky, for seven I bins: 100-150, 150-200, 200-280, 280-360, 360-440, 440-520, and 520-600. The results are shown 
in Figure |5[ The (Monte Carlo) mean of each estimator, computed from n = 10 trials of area 0.14 sr each, are shown. 
Note the similar performance of the estimators except in the low-Z bands. Note that in the full Planck experiment 
(«8 sr), the error bars would be smaller by a factor of ~ yJlA/S. 

We may test both the linear and nonlinear estimators for bias using the t-test. The t statistic for band a is given 
by 



t[a] = 



SamplcMcan(C**) - C[ 



(99) 



where is the sample variance of the C**. The t-test resuUs are shown in Table ^ A positive t statistic indicates 
that we are overestimating the power spectrum, a negative t statistic indicates that we are underestimating it. The 
t statistics here are designated in the table because they have 9 degrees of freedom. Also shown in the table is 
the two-tailed p- value, i.e. the probability of a perfect t random variable (with 9 degrees of freedom) having absolute 
value exceeding ig: 
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Test of convergence power spectrum estimators 
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FIG. 5: The true convergence power spectrum, Cf^, is shown by the sohd hne. The points (with error bars) indicate estimated 
convergence power spectra from the linear (+ points) and nonlinear (x points) estimators (equations ( [t^ and (^5|) respectively). 
To prevent the error bars from overlapping and causing confusion, we have displaced the data point for the linear estimator 
slightly to the left and the data point for the nonlinear estimator slightly to the right. The error bars are the la Monte carlo 
error bars on the expectation value of the estimator. The estimated power spectra plotted are averages over 10 trials of 0.14 
sr solid angle each using Planck parameters, and thus shows the error bar on the power spectrum using data from a region of 
area A{^1) = 1.4 sr. 



r r(n/2 + l/2) / 

p = 2 / ' 1 + - dx. (100) 

J\tg\ V'^7rr(n/2) V n J 

If the power spectrum estimator is unbiased and normally distributed, the p- value for each I bin is uniformly distributed 
between and 1. (Warning: because, for each estimator, we derived the p- values for all the I bins from the same 
10 simulations, there is no reason to believe that the p-values are independent.) We note that, for the high-^ bins 
{I > 200), the linear and nonlinear estimators give similar results; both are consistent with being unbiased, although 
the nonlinear estimator shows a lower sample variance. (This is partially the result of negative power-spectrum 
estimates being set to zero by the nonlinear estimator.) In the two lowest-^ bins, the sample variance of the nonlinear 
estimator is enormous. We note that in some of our simulations, the nonlinear estimator assigned anomalously large 
(in one case > 4 x 10~^) values of C""' to these bins; this suggests a problem with the estimator. This may be due to 
smearing of the bins by the finite width of the scanned region (width wO.38 sr) or may represent a problem with the 
iterative procedure (e.g. convergence to a local maximum of the likelihood). 

Due to the excessive computation time requirements, we were unable to run a similar simulation of power spectrum 
estimators for the high-resolution experiment. Such a simulation would be interesting because the nonlinear lensing 
potential estimator showed improvements at the > 10% level over the quadratic estimator for this experiment. 
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TABLE II: Convergence Power Spectrum Estimators: t-test 
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VI. CONCLUSIONS 



Weak lensing of CMB temperature maps has been recongized for some time as a potential probe for mapping the 
mass distribution of the universe (in projection), and determining quantities derivable from such a map: its power 
spectrum and cross-correlation with the CMB or other maps. In the past several years, methods for carrying out 
this statistical analysis have been proposed [|[ 0| and dramatically improved We have shown, by comparison to 
likelihood-based approaches, that quadratic estimator for the lensing potential [equivalent to our equation (^6|)] is 
very close to optimal for the Planck experiment. That is, for this experiment, there is no hope of further reduction in 
the statistical noise of the lensing potential. Similarly, simulations of the fully nonlinear power spectrum estimator 
do not show much improvement over the linear version. 

If the lensing potential <I> can be treated as a Gaussian random field, and for experiments for which the linearized 
approximation suffices, then our maximum-likelihood analysis indicates that the CMB temperature bispectrum and 
trispectrum are optimal estimators for the temperature- lensing potential cross-correlation (C®*) and lensing power 
spectrum, respectively. There is, in this case, no additional information in the fifth-order and higher statistics of the 
CMB. These higher-order statistics may be useful if the lensing potential is non-Gaussian; for example, the matter 
density bispectrum will result in a similar bispectrum in the quadratic estimator, equation (|3^), i.e. it its effect will 
be seen in the CMB six-point correlation function. 

For the higher-resolution experiments, our results indicate that the residual error in the lensing potential maps 
can be reduced by switching from the quadratic estimator to a "full" likclihood-bascd estimator. In order to make 
this approach practical, further work will be needed to develop a version of the algorithm that works near the survey 
boundaries and to improve the stability of the algorithm. Other approaches to the functional integral in equation 
( |60| ) besides the Gaussian approximation used here, such as Markov chains, could be used. It is even possible that, 
due to use of a better approximation to the integral, the error can be reduced further. However, given that the 
high-resolution (1 arcminute beam) simulation only showed improvement in RMS error at the 10-20% level, it may 
be preferable to simply use the quadratic estimator for these experiments, accept this minor loss in signal-to-noise 
ratio, and avoid the difficulties associated with the nonlinear estimator. 

One problem for both the quadratic estimator approach Q and our likelihood-based approach are extremely sensitive 
to errors in the primary CMB power spectrum, Cf'^ . In the quadratic approach, this can be seen by noting that the 
power spectrum C** is obtained by differencing two quantities which may be very close to each other in equation (|7^) ; 
the inverse Fisher matrix may be tens of times greater than C** for Planck (see Figure H) and consequ ently the 
quantities being subtracted in (fz^ ) may differ by only several percent. In the likelihood approach of Section [v C| this 
problem is masked by the formalism of the iterative scheme, but it is still there. Note that this problem is more serious 
for the lower-resolution experiments. It is apparent that a lensing power spectrum analysis must be accompanied by 
extremely accurate determination of Cf^ , or an estimation scheme must be introduced that is robust against small 
errors in C®® must be introduced, or both. 

We have performed no analysis here of lensing estimators using the CMB polarization. In | pO| a quadratic estimator 
analysis for the CMB polarization has been performed. Because the polarization is a Gaussian random field whose 
covariance depends on the lensing potential (i.e. there are observed covariances C^^ , C^^ , etc. analogous to the 
used here), an analysis analogous to that of Sections III and IV establishes that the quadratic estimator approach 
is optimal if the lensing is sufficiently weak. Because our simulation code cannot handle polarization, we cannot 
determine whether realistic lensing is sufficiently weak or whether a nonlinear maximum-likelihood analysis is required 
to make full use of polarization data sets. Any high resolution polarization experiment will have one of its main goals 
gravity wave detection in B mode. Since weak lensing creates B modes out of E modes it is important to remove 
this contamination as well as possible by using the weak lensing reconstruction [^2| Given that polarization and 
its E and B decomposition is sensitive to the direction of polarization in addition to its amplitude, it may be more 
susceptible to the errors induced by the linearization procedure. In this case the nonlinear analysis will be essential 
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to exploit fully the potential of any future high resolution CMB polarization experiment. 
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APPENDIX: POTENTIALS, CONVERGENCE, AND PROJECTED DENSITY 

Here we sketch a derivation of equation (|^), and use this to relate the lensing potential <f> to quantities of more 
direct physical interest. We use a Robertson- Walker metric with a Newtonian perturbation (i.e. a weak perturbation 
jvPl <C 1 induced by nonrelativistic matter): 



ds^ = [-(1 - 2*)dT2 + (1 + 2^){dr^ + S{r)^duj^)] (A.l) 

where a is a function of the conformal time r, and 'J is the gravitational potential, generally a function of all the 
coordinates. The comoving distance is r, and w € 5^ is a direction on the unit sphere with the usual line element 
du)^. We have used the sinelike function S{r) = k^^^^ sm{k^/'^r), and will use its derivative, the cosinelike function 
C(r) = cos(fc^/^r), and their ratio T{r) — S{r)/C{r), where k is the spatial curvature. We use as an initial condition 
T = at present, and normalize a(r = 0) = 1. The simplest way to find the photon deflection is to consider the 
conformal metric (which must have exactly the same null geodesies): 

dS^ = -(1 - A^)dT^ + dr^ + S{rfdLu^ (A.2) 
In this metric we compute for the null geodesies (to linear order in ^' and assuming that the geodesic is nearly radial) : 



dr2 S{r) dr duj ^ ' 

In the Born approximation, where the gradient over to is evaluated along the unperturbed linc-of-sight, this is an 
inhomogeneous linear equation in n which can be solved by the Green's function method to find uj at the last 
scattering surface. The result is that the null geodesic arriving at "us" (r = r = 0) from direction n is found to have 
originated in direction n + V$(n), where: 

$(n) = -2 / dr*(rn, -r) ( ) (A.4) 

^ ' Jo ^ ' \T{r) Tins)) ^ ' 

Through the use of the trigonometric identities and their hyperbolic counterparts we can show that: 



T{r) Tins) S{r)S{ns) ^ ' 



with which equation (A.4) can be shown to be equivalent to the forms provided by, e.g. |g, 

Since the lensing potential $ is represented here as a projected gravitational potential, it would make sense that its 
second derivative n = — ^V^^ would represent a projected density perturbation. This is indeed the case, although 
there are other contributions to k. If we define A to be the comoving three-dimensional Laplacian (i.e. on dr^ + 
S{r)'^duj^), as distinguished from the two-dimensional Laplacian on the unit sphere, we have the usual relation for 
A: 
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If we solve this relation for V^, we can split k 
involving the radial operator: 



'^V^$ into two terms: 



one involving the A operator and one 



1 



fJ: 

' \T{r) Tins) 



(A.7) 



The first term can be replaced with a density using Poisson's equation, thus generating a projected density. To study 
the second term, we replace the partial derivative over r (at constant r) with a total derivative along the line of sight 
and a time derivative. The time derivative can be neglected he re if the matter is nonrelativistic. [Indeed we have 
already made this assumption implicitly when we write equation ( AJ_).] Next integrate by parts so that the d/dr acts 
on \/T{r) — 1/T(ris). (Since 5'(0) = 0, the surface terms gene rated by the integration by parts will vanish.) Then we 
use the identity ^yrpr = — l/5'(r)^ to convert equation (A.7) into: 



K(n) = 47rG 



N 



f 

Jo 



dr 



1 



1 



T{r) T{ru) 



S{r)'^a{-r)^Sp{rn, -r) - "^{n.n, -r^) + ^'(0) 



(A.5 



where Gat is the universal gravitation constant and Sp is the density perturbation. Note that convergence can be 
broken into two components: a component due to the density fluctuations Sp along the line of sight, and a component 
due to the potential difference between the source and observer. The second component is due to tidal forces acting 
to separate the trajectories of CMB photons; conceptually, it has the same origin as the compression of the sky into 
a small solid angle near the zenith as seen by an observer near a black hole despite the absence of any mass-energy 
along the line of sight. 

Finally, we attempt to determine the magnitude of the potential-difference contribution to the convergence. Here 
^'(0) is a constant (isotropic) and can be removed by a gauge transformation, so we do not consider it further. The 
term ^(rj^) can be estimated based on the CMB temperature fluctuation using the Sachs- Wolfe relation \E'(r;s) = 30 
I prt ; thus the contribution of the ^'(r/s) term to k is —30. Since (7~3e,-3e _ g^^eo jg always at least a factor of P 
less than the overall power spectrum , its effect on the convergence power spectrum is subdominant with respect 
to cosmic variance. 
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